
Noratrieb
Member

Because it is not really helpful to have downstream instantiations that we won't inline anyway.(*)

(*) This is not quite true when `#[inline]` is effectively being used as `#[likely_unused]`.

Let's first check the performance impact of doing just this, without thinking any further about it.
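To make the rule under test concrete, here is a hypothetical, heavily simplified Rust model of it. `OptLevel`, `InstantiationMode`, `instantiation_mode`, and `codegen_count` are illustrative names (only `LocalCopy` comes from the PR title), not rustc's actual API; the real decision in rustc depends on many more inputs. The model also shows the `(*)` caveat: under `LocalCopy` an item that nobody calls is never codegened, whereas in this model a `GloballyShared` item is codegened once in the defining crate even if it is never used.

```rust
// Hypothetical, heavily simplified model of the rule named in this PR's title.
// All names are illustrative; real rustc logic considers generics,
// share-generics, incremental compilation, and more.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OptLevel {
    O0,
    O1,
    O2,
    O3,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstantiationMode {
    /// Codegened once, in the defining crate, and linked against by users.
    GloballyShared,
    /// Codegened again in every crate that uses the item, so LLVM can inline it.
    LocalCopy,
}

/// Proposed rule: `#[inline]` only earns a local copy when optimizations are
/// enabled, because at `-Copt-level=0` the copy would not be inlined anyway.
fn instantiation_mode(has_inline_attr: bool, opt_level: OptLevel) -> InstantiationMode {
    match (has_inline_attr, opt_level) {
        (true, OptLevel::O0) => InstantiationMode::GloballyShared,
        (true, _) => InstantiationMode::LocalCopy,
        (false, _) => InstantiationMode::GloballyShared,
    }
}

/// How many times the item is codegened, given how many crates call it.
/// With `LocalCopy` and zero callers it is never codegened at all, which is
/// the "`#[inline]` as `#[likely_unused]`" effect from the footnote.
fn codegen_count(mode: InstantiationMode, using_crates: usize) -> usize {
    match mode {
        InstantiationMode::GloballyShared => 1,
        InstantiationMode::LocalCopy => using_crates,
    }
}

fn main() {
    for opt in [OptLevel::O0, OptLevel::O1, OptLevel::O2, OptLevel::O3] {
        let mode = instantiation_mode(true, opt);
        println!(
            "{opt:?}: {mode:?}, codegened {} time(s) for 5 callers, {} for 0 callers",
            codegen_count(mode, 5),
            codegen_count(mode, 0)
        );
    }
}
```

The trade-off the thread goes on to discuss falls out of `codegen_count`: moving `#[inline]` items to a shared instantiation at `-Copt-level=0` saves repeated downstream codegen, but forces one copy upstream even for items nobody calls.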

@rustbot
Collaborator

rustbot commented Oct 4, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 4, 2025
@rustbot
Collaborator

rustbot commented Oct 4, 2025

r? @fee1-dead

rustbot has assigned @fee1-dead.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@Noratrieb
Member Author

@bors try @rust-timer queue

@rust-timer


rust-bors bot added a commit that referenced this pull request Oct 4, 2025
…=<try>

Avoid `LocalCopy` instantiation for `#[inline]` on `-Copt-level=0`
@rust-bors


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 4, 2025
@Noratrieb Noratrieb marked this pull request as draft October 4, 2025 20:30
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 4, 2025
@rust-log-analyzer
Collaborator

The job `aarch64-gnu-llvm-20-1` failed! Check out the build log.

Possible cause of the failure (guessed by this bot):
---- [ui] tests/ui/simd/intrinsic/inlining-issue67557-ice.rs stdout ----

error: test compilation failed although it shouldn't!
status: exit status: 101
command: env -u RUSTC_LOG_COLOR RUSTC_ICE="0" RUST_BACKTRACE="short" "/checkout/obj/build/aarch64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/tests/ui/simd/intrinsic/inlining-issue67557-ice.rs" "-Zthreads=1" "-Zsimulate-remapped-rust-src-base=/rustc/FAKE_PREFIX" "-Ztranslate-remapped-path-to-local-path=no" "-Z" "ignore-directory-in-diagnostics-source-blocks=/cargo" "-Z" "ignore-directory-in-diagnostics-source-blocks=/checkout/vendor" "--sysroot" "/checkout/obj/build/aarch64-unknown-linux-gnu/stage2" "--target=aarch64-unknown-linux-gnu" "--check-cfg" "cfg(test,FALSE)" "--error-format" "json" "--json" "future-incompat" "-Ccodegen-units=1" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zwrite-long-types-to-disk=no" "-Cstrip=debuginfo" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/aarch64-unknown-linux-gnu/test/ui/simd/intrinsic/inlining-issue67557-ice/a" "-A" "internal_features" "-A" "unused_parens" "-A" "unused_braces" "-Crpath" "-Cdebuginfo=0" "-Lnative=/checkout/obj/build/aarch64-unknown-linux-gnu/native/rust-test-helpers" "-Zmir-opt-level=4"
stdout: none
--- stderr -------------------------------
##[error]error: internal compiler error: compiler/rustc_mir_transform/src/validate.rs:80:25: broken MIR in Item(DefId(0:21 ~ inlining_issue67557_ice[6d41]::{impl#2}::eq)) (after phase change to runtime-optimized) at bb0[0]:
                                Projecting into SIMD type Simd2 is banned by MCP#838
  --> /checkout/tests/ui/simd/intrinsic/inlining-issue67557-ice.rs:12:14
   |
LL | #[derive(Debug, PartialEq)]
   |                 --------- in this derive macro expansion
LL | struct Simd2([u8; 2]);
   |              ^^^^^^^


thread 'rustc' (150276) panicked at compiler/rustc_mir_transform/src/validate.rs:80:25:
Box<dyn Any>
stack backtrace:
   0: std::panicking::begin_panic::<rustc_errors::ExplicitBug>
   1: <rustc_errors::diagnostic::BugAbort as rustc_errors::diagnostic::EmissionGuarantee>::emit_producing_guarantee
   2: <rustc_errors::DiagCtxtHandle>::span_bug::<rustc_span::span_encoding::Span, alloc::string::String>
   3: rustc_middle::util::bug::opt_span_bug_fmt::<rustc_span::span_encoding::Span>::{closure#0}
   4: rustc_middle::ty::context::tls::with_opt::<rustc_middle::util::bug::opt_span_bug_fmt<rustc_span::span_encoding::Span>::{closure#0}, !>::{closure#0}
   5: rustc_middle::ty::context::tls::with_context_opt::<rustc_middle::ty::context::tls::with_opt<rustc_middle::util::bug::opt_span_bug_fmt<rustc_span::span_encoding::Span>::{closure#0}, !>::{closure#0}, !>
   6: rustc_middle::util::bug::span_bug_fmt::<rustc_span::span_encoding::Span>
   7: <rustc_mir_transform::validate::CfgChecker>::fail::<alloc::string::String>
   8: <rustc_mir_transform::validate::Validator as rustc_mir_transform::pass_manager::MirPass>::run_pass
   9: rustc_mir_transform::pass_manager::run_passes_inner
  10: rustc_mir_transform::run_optimization_passes
  11: rustc_mir_transform::optimized_mir
      [... omitted 3 frames ...]
  12: rustc_mir_transform::cross_crate_inline::cross_crate_inlinable
---
  23: rustc_codegen_ssa::base::codegen_crate::<rustc_codegen_llvm::LlvmCodegenBackend>
  24: <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::CodegenBackend>::codegen_crate
  25: rustc_interface::passes::start_codegen
  26: <rustc_interface::queries::Linker>::codegen_and_build_linker
  27: <std::thread::local::LocalKey<core::cell::Cell<*const ()>>>::with::<rustc_middle::ty::context::tls::enter_context<<rustc_middle::ty::context::GlobalCtxt>::enter<rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>::{closure#1}, core::option::Option<rustc_interface::queries::Linker>>::{closure#0}, core::option::Option<rustc_interface::queries::Linker>>
  28: <rustc_middle::ty::context::TyCtxt>::create_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2}::{closure#0}>
  29: <rustc_interface::passes::create_and_enter_global_ctxt<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>::{closure#2} as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once::{shim:vtable#0}
  30: <alloc::boxed::Box<dyn for<'a> core::ops::function::FnOnce<(&'a rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &'a std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt<'a>>, &'a rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena<'a>>, &'a rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena<'a>>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}), Output = core::option::Option<rustc_interface::queries::Linker>>> as core::ops::function::FnOnce<(&rustc_session::session::Session, rustc_middle::ty::context::CurrentGcx, alloc::sync::Arc<rustc_data_structures::jobserver::Proxy>, &std::sync::once_lock::OnceLock<rustc_middle::ty::context::GlobalCtxt>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_middle::arena::Arena>, &rustc_data_structures::sync::worker_local::WorkerLocal<rustc_hir::Arena>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2})>>::call_once
  31: rustc_interface::passes::create_and_enter_global_ctxt::<core::option::Option<rustc_interface::queries::Linker>, rustc_driver_impl::run_compiler::{closure#0}::{closure#2}>
  32: <scoped_tls::ScopedKey<rustc_span::SessionGlobals>>::set::<rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}, ()>
  33: rustc_span::create_session_globals_then::<(), rustc_interface::util::run_in_thread_with_globals<rustc_interface::util::run_in_thread_pool_with_globals<rustc_interface::interface::run_compiler<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}, ()>::{closure#0}, ()>::{closure#0}::{closure#0}::{closure#0}>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

note: using internal features is not supported and expected to cause internal compiler errors when used incorrectly

note: rustc 1.92.0-nightly (61fea9370 2025-10-04) running on aarch64-unknown-linux-gnu

note: compiler flags: -Z threads=1 -Z simulate-remapped-rust-src-base=/rustc/FAKE_PREFIX -Z translate-remapped-path-to-local-path=no -Z ignore-directory-in-diagnostics-source-blocks=/cargo -Z ignore-directory-in-diagnostics-source-blocks=/checkout/vendor -C codegen-units=1 -Z ui-testing -Z deduplicate-diagnostics=no -Z write-long-types-to-disk=no -C strip=debuginfo -C prefer-dynamic -C rpath -C debuginfo=0 -Z mir-opt-level=4

query stack during panic:
#0 [optimized_mir] optimizing MIR for `<impl at /checkout/tests/ui/simd/intrinsic/inlining-issue67557-ice.rs:11:17: 11:26>::eq`
#1 [cross_crate_inlinable] whether the item should be made inlinable across crates
#2 [reachable_set] reachability
#3 [reachable_non_generics] looking up the exported symbols of a crate
#4 [is_reachable_non_generic] checking whether `inline_me` is an exported symbol
#5 [collect_and_partition_mono_items] collect_and_partition_mono_items
end of query stack
error: aborting due to 1 previous error
------------------------------------------

---- [ui] tests/ui/simd/intrinsic/inlining-issue67557-ice.rs stdout end ----

@rust-bors

rust-bors bot commented Oct 4, 2025

☀️ Try build successful (CI)
Build commit: 85bd5e8 (85bd5e831a7047f663636aa92624a501051dbb74, parent: 99ca0ae87ba5571acee116ea83d1f9e88a7bf8d8)

@rust-timer


@rust-timer
Collaborator

Finished benchmarking commit (85bd5e8): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 23.3% | [0.1%, 897.4%] | 48 |
| Regressions ❌ (secondary) | 2.7% | [0.1%, 32.6%] | 43 |
| Improvements ✅ (primary) | -1.4% | [-4.6%, -0.2%] | 20 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.1%] | 2 |
| All ❌✅ (primary) | 16.0% | [-4.6%, 897.4%] | 68 |

Max RSS (memory usage)

Results (primary 2.0%, secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.4% | [0.8%, 15.6%] | 24 |
| Regressions ❌ (secondary) | 5.8% | [1.4%, 12.4%] | 11 |
| Improvements ✅ (primary) | -2.7% | [-6.8%, -0.8%] | 12 |
| Improvements ✅ (secondary) | -4.4% | [-6.9%, -1.8%] | 14 |
| All ❌✅ (primary) | 2.0% | [-6.8%, 15.6%] | 36 |

Cycles

Results (primary 52.9%, secondary 5.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 61.9% | [2.3%, 923.3%] | 19 |
| Regressions ❌ (secondary) | 8.2% | [2.0%, 26.5%] | 14 |
| Improvements ✅ (primary) | -4.0% | [-4.8%, -3.3%] | 3 |
| Improvements ✅ (secondary) | -3.7% | [-6.6%, -1.1%] | 5 |
| All ❌✅ (primary) | 52.9% | [-4.8%, 923.3%] | 22 |

Binary size

Results (primary 7.6%, secondary 14.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 12.6% | [1.2%, 67.5%] | 55 |
| Regressions ❌ (secondary) | 15.3% | [1.3%, 78.2%] | 27 |
| Improvements ✅ (primary) | -1.9% | [-9.1%, -0.2%] | 29 |
| Improvements ✅ (secondary) | -0.2% | [-0.4%, -0.0%] | 2 |
| All ❌✅ (primary) | 7.6% | [-9.1%, 67.5%] | 84 |

Bootstrap: 471.44s -> 471.725s (0.06%)
Artifact size: 388.29 MiB -> 387.96 MiB (-0.09%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 4, 2025
@saethlin
Member

saethlin commented Oct 5, 2025

The change improves full builds of binaries. We don't have very many of those in the benchmark suite.

The 897% change is in an incr-patched benchmark; you probably angered the CGU-merging gods and need to apply the strategy I explained in #145910. Some other libraries also regress, and those are pretty easy to hand-wave away as needing to codegen more items.

I think the rest of the changes, which look like relatively random perturbations to library build times, happen simply because we use instantiation modes as an unprincipled caching system that crate authors are not actively trying to take advantage of. So if you perturb the rules for what gets `LocalCopy`, some crates benefit and some regress.

@Noratrieb
Member Author

We also need to address how this likely undoes some of the wins described in a comment on #117727.

@bjorn3
Member

bjorn3 commented Oct 5, 2025

stdarch depends on `#[inline]` skipping local codegen on wasm. Otherwise, the wasm runtime would need to support `simd128` and `relaxed-simd` instructions to run any wasm module in which the corresponding SIMD intrinsics aren't garbage-collected (GCed).

@saethlin
Member

saethlin commented Oct 5, 2025

stdarch should be using an attribute that's designed for that purpose.


Labels

perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.


7 participants